Audio Visual Speaker Verification Based on Hybrid Fusion of Cross Modal Features

نویسندگان

  • Girija Chetty
  • Michael Wagner
چکیده

In this paper, we propose hybrid fusion of audio and explicit correlation features for speaker identity verification applications. Experiments were performed with the GMM based speaker models with a hybrid fusion technique involving late fusion of explicit cross-modal fusion features, with implicit eigen lip and audio MFCC features. An evaluation of the system performance with different gender specific datasets from controlled VidTIMIT data base and opportunistic UCBN database shows a significant performance improvement.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Audiovisual speaker identity verification based on cross modal fusion

In this paper, we propose the fusion of audio and explicit correlation features for speaker identity verification applications. Experiments performed with the GMM based speaker models with hybrid fusion technique involving late fusion of explicit cross-modal fusion features, with eigen lip and audio MFCC features allow a considerable improvement in EER performance An evaluation of the system pe...

متن کامل

Multifactor Fusion for Audio-Visual Speaker Recognition

In this paper we propose a multifactor hybrid fusion approach for enhancing security in audio-visual speaker verification. Speaker verification experiments conducted on two audiovisual databases, VidTIMIT and UCBN, show that multifactor hybrid fusion involve a combination feature-level fusion of lip-voice features and face-lip-voice features at score-level is indeed a powerful technique for spe...

متن کامل

Liveness Verification in Audio-Video Speaker Authentication

In this paper we propose liveness verification for audio-video speaker authentication systems to guard against possible replay attacks that employ pre-recorded audio and/or still face samples from a photo. The technique uses a fusion of acoustic features and visual features, automatically extracted from the lip region, to differentiate synchronous audio-video presentations from asynchronous rep...

متن کامل

UCBN: A new audio-visual broadcast news corpus for multimodal speaker verification studies

The performance of face, voice, and multimodal speaker verification systems in complex and non-controlled scenarios, is typically lower than systems developed in highly controlled environments. With the aim to facilitate the development of robust multi-modal speaker recognition systems, a new multi-modal (audio-visual) Australian broadcast UCBN (University of Canberra Broadcast News) corpus was...

متن کامل

Audio-visual multilevel fusion for speech and speaker recognition

In this paper we propose a robust audio-visual speech-andspeaker recognition system with liveness checks based on audio-visual fusion of audio-lip motion and depth features. The liveness verification feature added here guards the system against advanced spoofing attempts such as manufactured or replayed videos. For visual features, a new tensor-based representation of lip motion features, extra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007